THE MEASUREMENT OF CONFIDENCE AND TRUST


Abstract 

This  report  is  concerned  with  the  development  of  a  research  methodology 
and  a  theoretical  framework  for  investigating  the  effects  of  social  influence 
in  a  simple  judgmental  situation.  The  laboratory  task  entails  a  simple 
binary  judgment  as  to  whether  a  displayed  angle  departs  from  90°;  before 
making  his  own  response  the  subject  is  provided  with  the  answer  of  a 
hypothetical  partner,  programmed  at  a  certain  fixed  accuracy  level.  The 
responses  are  made  in  terms  of  a  special  betting  scheme  which  penalizes 
the  subject  for  overstating  or  understating  his  confidence.  The  two  main 
experimental  variables  in  this  study  are  the  difficulty  of  the  discrimination 
and  the  announced  reliability  of  the  hypothetical  partner.  Theoretical 
predictions  as  to  the  effects  of  these  variables  on  the  relative  value  of 
confidence  measures  are  confirmed.  However,  further  methodological 
development  is  required  to  increase  the  realism  of  subjects'  confidence 
scores. 




Table  of  Contents 


Introduction

Theoretical Framework

Experimental Procedure

Results

Summary

Appendix (Instructions)

References




Tables 


1. Correspondence between Confidence Measures and Payoff Values

2. Experimental Design

3. Accuracy and Confidence Scores under Various Experimental Conditions
before Introduction of Pony

4. Detection Accuracy at Various Levels of Confidence under Differing
Experimental Conditions

5. Mean Accuracy of Composite Judgments under Various Experimental
Conditions

6. Mean Confidence (Corrected to Probability) under Various Experimental
and Event Conditions

7. Mean Estimated Confidence (c) and Trust (t) under Various Experimental
Conditions

8. Variance Associated with Selected Combinations of Experimental and
Event Conditions




The  Measurement  of  Confidence  and  Trust 


T.  B.  Roby 
Tufts  University 

Teresa  Carterette 
Simmons  College 


Introduction 

The effects of social influence upon perception and behavior constitute
an important and rewarding subject for social psychological research.
The experimental literature contains a number of dramatic and reproducible
demonstrations of the fact that such effects occur (e.g., Sherif, Asch). In
order to go beyond these pioneer studies, however, it will be necessary to
obtain precise information as to the factors that determine the extent of the
influence effects. This objective entails extensive methodological development
both in the specification of experimental variables to be independently
manipulated and in the measurement of dependent variables indicating
influence.

The  present  line  of  investigation  grew  out  of  exploratory  studies  of 
the  acquisition  and  exchange  of  information  in  small  group  performance 
(Roby,  Harleston,  and  Eyde,  1961;  Farrell,  Nicol  and  Roby,  1961).  During 
the  course  of  these  investigations  it  became  increasingly  clear  that  any 
thorough  explication  of  the  overall  group  process  of  information  acquisition 
depended  on  understanding  the  way  in  which  a  given  team  member  reacted 
to  information  he  received  from  other  team  members  --  particularly  in 
comparison  with  his  reaction  to  directly  observed  information. 

The  specific  methodological  needs  that  were  pointed  up  included: 

1. A theoretical framework which identifies explicitly the bases
for a subject's confidence in his own judgment and his trust in the judgments
he obtains vicariously.

2. An experimental paradigm in which these factors can be systematically
manipulated or controlled.

3.  A  theoretical  analysis  of  the  dynamic  interplay  between  confidence 
and  trust,  and  the  way  in  which  this  may  be  reflected  in  overt  behavior. 

4.  A  measurement  procedure  that  affords  direct  evaluation  of  the 
net  confidence  that  a  subject  places  in  a  judgment  that  is  based  upon  his 
own  opinion  and  that  of  a  real  or  fictitious  partner. 

The present theoretical approach is adapted from a more comprehensive
treatment of epistemic processes discussed in an earlier report
(Roby, in press). The basis for the approach lies in the now familiar
concept of "subjective probability" but it employs this concept in a rather
special form. Very briefly, it is assumed that subjective probability
estimates may be regarded as forming a vector entity (referred to as a
"belief state") which undergoes certain specified transformations under the
influence of particular items of external evidence.

The  major  hurdle  in  experimental  applications  of  this  theoretical 
framework  has  been  the  development  of  a  feasible  measure  of  confidence 
and  trust  --  both  of  which  are  regarded  as  probabilistic  entities.  To  show 
the  relation  between  theory  and  experiment  it  will  be  necessary  to  describe 
the  former  in  some  detail,  even  though  the  present  study  does  not  provide 
direct  tests  of  all  of  the  conjectures  here  advanced. 


Theoretical  Framework 

The experimental situation that is investigated is one in which the
subject makes a series of binary psychophysical judgments with the aid of
a simulated partner or "pony," P. The difficulty of the judgments is
controlled by varying signal difficulty and the reliability of P is also varied
in different experimental conditions. The questions of chief experimental
interest are how the subject's confidence in his independent judgment, and
trust in P's judgment vary with experimental conditions, and how confidence
and trust interact in composite judgments.

It is assumed that, on each judgment trial, the subject will experience
some sensory correlate of the external display which will take the
form $\hat{Y}$ (yes) or $\hat{N}$ (no). The subject's confidence is defined
as the probability he assigns to the event that the external signal, Y or N,
actually corresponds to the sensory correlate. For simplicity, and without
serious loss of generality, it will here be supposed that the probability of
Y and N signals, and the associated probabilities of $\hat{Y}$ and $\hat{N}$
sensations, are symmetric. Then confidence is defined by

$c = \hat{P}(Y \mid \hat{Y}) = \hat{P}(N \mid \hat{N})$

(The probability values are also identified as subjective by the circumflex
notation.)

Initial  confidence  and  signal  difficulty 

Upon initial exposure to a particular judgmental task, the subject
does not have any very firm basis for estimating his appropriate confidence
level. However, he does have two sources of evidence to go on: the first
is his general success in judgmental tasks of the same class; and the
second is the internal (central nervous system) distinctiveness of the
difference between Y and N. Thus it seems quite possible that the subject
can make a rough scaling of the distinctiveness of $\hat{Y}$ versus $\hat{N}$
and apply to this an estimate of past judgmental success in dealing with
discriminations of similar difficulty.

In an experimental situation the distinctiveness of $\hat{Y}$ or $\hat{N}$
can be manipulated by varying signal difficulty, and the generalized
confidence of the subject can be modified by controlling his success or
failure experiences. It is assumed that these will interact in a
multiplicative fashion to produce an initial confidence value $c_0$ that is
attached to the earliest trial on a new judgmental task. No precise
quantitative model will be suggested for $c_0$, however, as it is not of
direct relevance to the experimental situation here considered.

Modification  of  c  with  feedback 


If the subject is repeatedly exposed to the judgmental task, receives
an internal sensation $\hat{Y}$ or $\hat{N}$, and has these impressions
confirmed or infirmed by trustworthy feedback, his confidence will presumably
be modified to conform to his success. Here it will be assumed that this
change in confidence from trial to trial is described by a simple operator
function. For trials on which the internal impression $\hat{Y}$ or $\hat{N}$
is confirmed, this takes the form,

1) $c_{n+1} = c_n + \lambda(1 - c_n)$.

For those trials on which the internal impression is infirmed by
later feedback, the effect is described by,

2) $c_{n+1} = c_n - \lambda c_n$.

That is, it is assumed that the effect in either case is to raise or lower
the confidence by a fixed proportion, $\lambda$, of the possible change in
either direction.

Next, the assumption is made that the general expression for all
trials will be a weighted combination of the respective effects on success
and failure trials. For veridical feedback, there will be a certain
proportion d of successful trials, and (1-d) of unsuccessful trials. The
proportion d is a direct measure of the subject's accuracy in making the
discrimination. Weighting equations 1 and 2 by these proportions, there
results,

3) $c_{n+1} = d[c_n + \lambda(1 - c_n)] + (1 - d)(1 - \lambda)c_n = c_n + \lambda(d - c_n)$

For the present, interest attaches primarily to the steady-state value of
$c_n$, that is, the value at which $c_{n+1} = c_n$. By direct substitution in
equation 3 this is seen to occur at the point $c_n = d$. That is, the
subject's confidence will asymptotically approach his accuracy of
discrimination.
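
The convergence claimed for equation 3 can be checked numerically. The sketch below is only an illustration: the function names, the rate value 0.1 (standing in for the fixed proportion of equations 1 and 2) and the starting confidences are assumptions, not values taken from the report.

```python
def update_confidence(c, confirmed, lam):
    """Apply equation 1 (confirmed trial) or equation 2 (infirmed trial)."""
    if confirmed:
        return c + lam * (1.0 - c)  # move a fraction lam of the way toward 1
    return c - lam * c              # move a fraction lam of the way toward 0


def expected_confidence(c0, d, lam, n_trials):
    """Iterate equation 3, the expected update when accuracy is d."""
    c = c0
    for _ in range(n_trials):
        c = c + lam * (d - c)
    return c


if __name__ == "__main__":
    # Hypothetical starting points; both should settle near d = 0.75.
    for start in (0.50, 0.95):
        print(start, round(expected_confidence(start, d=0.75, lam=0.1, n_trials=200), 4))
```

Whatever the starting value, the iterated confidence approaches the accuracy d, and a larger rate only shortens the approach.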

Joint  confidence  and  trust  development 


A  supplementary  partner  judgment  received  by  the  subject  may 
affect  his  confidence  in  the  judgment  on  any  particular  trial  as  well  as 
his  overall  confidence  level.  The  latter  effect  is  considered  first. 

There are four distinct cases: (a) The subject receives veridical
feedback to evaluate both his own judgment and that of the partner; (b)
The subject receives veridical feedback on his own judgment but not on his
partner's (implying that feedback is not received on trials for which the
partner's judgment is available); (c) The subject receives veridical
feedback on his partner's judgment but not on his own (for the converse
reason); and (d) no feedback is received on either his own or the partner's
judgment.

For Case (a) it is assumed that trust in the partner's judgment
will follow essentially the same operational formula as described by
equation 3. If r is the partner's reliability of judgment and t is the trust
invested in the partner, then

4) $t_{n+1} = t_n + \lambda(r - t_n)$

The asymptotic value for $t_n$ is of course r, the partner's reliability.


In Case (b) the subject has a firm basis for estimating his own
accuracy, but can evaluate his partner only by the latter's agreement with
his own judgment on no-feedback trials. It is assumed that, on such trials,
the subject's trust in his partner is modified by an operator similar to
those employed above but weighted by a proportionality factor depending
on the subject's own self-confidence. Specifically, it is hypothesized that,
for those trials on which S and P agree,

5) $t_{n+1} = t_n + \lambda(c_n - .5)(1 - t_n)$

For trials on which they disagree,

6) $t_{n+1} = t_n - \lambda(c_n - .5)t_n$

If S and P agree on a proportion g of those trials for which P's judgment
is available, the equations are combined as before, yielding

7) $t_{n+1} = t_n(1 - \lambda(c_n - .5)) + g\lambda(c_n - .5)$

Considering only asymptotic values of $t_n$, there are several possibilities
for this equation. If $c_n$ goes to .5, subsequent values of $t_n$ remain
stationary. This outcome is of limited interest. If $c_n$ reaches any
asymptotic value above .5 then the steady-state condition $t_{n+1} = t_n$
implies $t_n = g$. That is, the trust will become identical with the rate of
agreement between the subject and P.

Case  (c)  is,  of  course,  the  same,  mutatis  mutandis,  as  Case  (b). 

In Case (d), with no feedback at all, there is still the possibility
of a sort of bootstrap reinforcement of confidence and trust. This will be
based on the inference by S that he could agree with the partner on more
than a chance number of trials only if they were both performing at better
than chance accuracy. Implicit, too, are the assumptions that their
judgments are initially independent and that their responses depart from
chance in the correct direction. We assume then that the equations for Cases
b and c hold simultaneously. Rearranging them slightly, there results

8) $t_{n+1} = t_n + \lambda(c_n - .5)(g - t_n)$

9) $c_{n+1} = c_n + \lambda(t_n - .5)(g - c_n)$

Subtracting the second equation from the first and rearranging terms,

10) $t_{n+1} - c_{n+1} = (t_n - c_n)[1 - \lambda(g - .5)]$

As the bracketed term on the right hand side is less than one (for g
above .5), it is clear that $t_n$ and $c_n$ must ultimately become equal.

The specific implication of equation 10 is that

11) $c_n - t_n = (c_0 - t_0)(1 - \lambda(g - .5))^n$

where $c_0$ and $t_0$ are the initial levels of confidence and trust. But then
it follows that $t_n = c_n - (c_0 - t_0)(1 - \lambda(g - .5))^n$, which can be
substituted into equation 9, giving

12) $c_{n+1} = c_n + \lambda[c_n - .5 - (c_0 - t_0)(1 - \lambda(g - .5))^n](g - c_n)$.

This equation is non-linear with a variable coefficient so that explicit
solution is difficult. However, it can be solved for the asymptotic value c.
The roots of the equation $c_{n+1} - c_n = 0$ occur at $c_n = .5$ and
$c_n = g$. The latter root is assumed to represent the typical outcome: both
confidence and trust become equal to the rate of agreement.
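
Because the joint recursion for the no-feedback case has no convenient closed form, a short simulation may help show its behavior. The following is a minimal sketch under assumed values (starting levels, agreement rate and rate constant are all hypothetical); it iterates the two coupled updates for trust and confidence simultaneously.

```python
def simulate_no_feedback(c0, t0, g, lam, n_trials):
    """Iterate the coupled Case (d) updates for confidence c and trust t.

    Both updates use the values from the previous trial, so the pair is
    advanced in a single tuple assignment.
    """
    c, t = c0, t0
    for _ in range(n_trials):
        c, t = (c + lam * (t - 0.5) * (g - c),
                t + lam * (c - 0.5) * (g - t))
    return c, t


if __name__ == "__main__":
    # Assumed values: confidence starts at .6, trust at .8, agreement rate .7.
    c, t = simulate_no_feedback(c0=0.6, t0=0.8, g=0.7, lam=0.2, n_trials=2000)
    print(round(c, 4), round(t, 4))
```

Starting from any pair of values above .5, both quantities drift to the agreement rate g, whereas a start at exactly .5 for both leaves them stationary, which corresponds to the other root noted in the text.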

Composite  judgments 


The other aspect of the relation between c and t is their interaction
on a single trial. Given a specified level of c and t on a particular trial
and a certain internal sensation, $\hat{Y}$ or $\hat{N}$, how will the subject
utilize a supplementary partner judgment that may agree or disagree with his
own?

The appropriate formula for combining the values for c and t in
order to estimate composite confidence, v, would seem to have the
following properties:

(1) The direction of the final judgment, Y or N, should be the same as
that of the judgment associated with the greater of c or t.

(2) v should be greater than c if the subject and P agree, but less if they
disagree.

(3) If either c or t is at .5, it will not affect v at all.

(4) As either c or t approaches 1.0, so will v.

These criteria suggest that c and t should be combined by the standard
Bayes formula for calculation of inverse probability. Thus if the subject's
own initial judgment is that the signal is present, and the partner agrees,

13) $v_1 = ct/(ct + \bar{c}\bar{t}) = ct/g$

If they disagree, the expression is

14) $v_2 = c\bar{t}/(c\bar{t} + \bar{c}t) = c\bar{t}/\bar{g}$

assuming that c is greater than t. Here a bar denotes the complement, so
that $\bar{c} = 1 - c$, $\bar{t} = 1 - t$ and $\bar{g} = 1 - g$.

Verbally, equation 13 may be paraphrased from the subject's
standpoint as, "the conditional probability that the joint judgment (Y or N)
is correct, if both P and I think it is correct, is the probability of our
agreeing on a correct judgment divided by the total probability of
agreement."
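
The combination rule of equations 13 and 14 is compact enough to state as a single function. The sketch below is illustrative only; the function name and the sample values of c and t are assumptions chosen to exercise the four properties listed above.

```python
def composite_confidence(c, t, agree):
    """Combine own confidence c and trust t by Bayes' rule.

    Returns the probability that the subject's own judgment is correct,
    given that the partner agrees (equation 13) or disagrees (equation 14).
    Assumes 0 < c < 1 and 0 < t < 1 so the denominators are non-zero.
    """
    if agree:
        return c * t / (c * t + (1 - c) * (1 - t))
    return c * (1 - t) / (c * (1 - t) + (1 - c) * t)


if __name__ == "__main__":
    c, t = 0.71, 0.62  # hypothetical levels of confidence and trust
    print(round(composite_confidence(c, t, agree=True), 3))
    print(round(composite_confidence(c, t, agree=False), 3))
    # A partner trusted at exactly .5 leaves confidence unchanged.
    print(round(composite_confidence(c, 0.5, agree=True), 3))
```

In the disagreement case a result below .5 means that the partner's direction is the more probable one, which is how property (1) shows up in this formulation.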

Empirical  derivation  of  confidence  and  trust  indices 


In practice it may not be possible to measure c and t directly, and it
will be necessary to infer those quantities from observed values of the
composite confidence, v. This section will derive equations from which
the estimate may be made.

Suppose that the subject's composite confidence on those trials in
which he is in agreement with P is $v_1$ and that the composite confidence is
$v_2$ on trials in which the subject and P disagree. Then from earlier
results, it is assumed that

15) $v_1 = ct/(ct + \bar{c}\bar{t})$

16) $v_2 = c\bar{t}/(c\bar{t} + \bar{c}t)$

Equation 15 is solved to obtain an expression for t, i.e.,

17) $t = \bar{c}v_1/(\bar{c}v_1 + c\bar{v}_1)$

from which also

18) $\bar{t} = c\bar{v}_1/(\bar{c}v_1 + c\bar{v}_1)$

with barred quantities denoting complements ($\bar{v}_1 = 1 - v_1$). These
values are then substituted in the second equation, giving

19) $\bar{c}v_2 \cdot \bar{c}v_1/(\bar{c}v_1 + c\bar{v}_1) = c\bar{v}_2 \cdot c\bar{v}_1/(\bar{c}v_1 + c\bar{v}_1)$

19a) $c^2 \bar{v}_2 \bar{v}_1 = \bar{c}^2 v_1 v_2$

This can be solved in turn to provide a quadratic expression for c, i.e.,

20) $c = \dfrac{v_1 v_2 \pm \sqrt{v_1 v_2 \bar{v}_1 \bar{v}_2}}{v_1 + v_2 - 1}$

and t can thus be found by substitution in one of the earlier equations.
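
The inversion just derived can be sketched as a small routine. This is an illustration under stated assumptions rather than the procedure actually used for the analyses: the function name is hypothetical, and the root of equation 20 that lies between 0 and 1 is computed in the algebraically equivalent form $\sqrt{v_1 v_2}/(\sqrt{v_1 v_2} + \sqrt{\bar{v}_1 \bar{v}_2})$, which avoids dividing by $v_1 + v_2 - 1$ when that quantity is zero.

```python
import math


def infer_confidence_and_trust(v1, v2):
    """Recover (c, t) from composite confidence on agree (v1) and disagree (v2) trials.

    Assumes 0 < v1 < 1 and 0 < v2 < 1.
    """
    a = math.sqrt(v1 * v2)              # square root of v1 * v2
    b = math.sqrt((1 - v1) * (1 - v2))  # square root of the complements' product
    c = a / (a + b)                     # admissible root of equation 20
    # Equation 17: trust from confidence and the agreement-trial value v1.
    t = (1 - c) * v1 / ((1 - c) * v1 + c * (1 - v1))
    return c, t


if __name__ == "__main__":
    c, t = infer_confidence_and_trust(0.80, 0.60)
    print(round(c, 2), round(t, 2))
```

Feeding the recovered pair back through equations 15 and 16 should reproduce the observed composite values, which gives a convenient check on any implementation.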

To illustrate, suppose that $v_1 = .80$ and $v_2 = .60$. Then

$c = (.48 \pm \sqrt{.0384})/.40 = (.48 \pm .196)/.40 = 1.69$ or $.71$

The latter value is obviously the appropriate one. Then, from equation
17,

$t = .29 \times .80/(.29 \times .80 + .71 \times .20) = .62$

Putting these values back in equations 15 and 16 gives

$v_1 = .71 \times .62/(.71 \times .62 + .29 \times .38) = .80$ as obtained

$v_2 = .71 \times .38/(.71 \times .38 + .62 \times .29) = .60$ as obtained

Measurement of confidence

In order to test the consequences of these formulations, it is
necessary to have a sensitive and valid measure of confidence and associated
constructs. A direct introspective report of confidence -- e.g. simply
asking the subject "how sure he is" of his judgment -- has obvious
drawbacks. In particular, the results obtained under these instructions must
depend upon whether the subject interprets his objective as maximizing
the stated confidence on successful trials or minimizing the stated
confidence on unsuccessful trials.

Even the device of rewarding the subject for the realism of his
estimated confidence is of questionable value. The basic difficulty here
is that if the reward is based on the overall agreement between mean
confidence and mean accuracy it does not provide an incentive for accurate
estimation of confidence on each trial. The subject's best strategy is to
report a fixed confidence value that accords with his rate of success,
ignoring trial-to-trial fluctuations in subjective confidence.

The measure here suggested, and incorporated in the empirical study,
is based on the more general conceptual framework cited earlier (Roby,
1962). This measure offers the subject a graded family of bets with
payoffs dependent upon stated confidence. Specifically, the payoff for a
correct judgment is proportional to $c/\sqrt{c^2 + \bar{c}^2}$ and the payoff
for an incorrect judgment is $\bar{c}/\sqrt{c^2 + \bar{c}^2}$, where
$\bar{c} = 1 - c$. Thus the potential gain on a successful trial increases as
the stated c increases, but the gain for an unsuccessful trial decreases.
Because the denominator increases from a minimum at $c = \bar{c} = .5$ to a
maximum value at c = 1, the subject is penalized for overstating his
confidence.

As a numerical illustration, suppose that the subject's actual
confidence is .75 that the signal is present. The corresponding optimal bet is
(.75, .25) with payoffs of $.75/\sqrt{.75^2 + .25^2} = .948$ and
$.25/\sqrt{.75^2 + .25^2} = .316$ if the signal is present or absent,
respectively. The expected payoff is .75 x .948 + .25 x .316 = .790. If the
subject selects the bet corresponding to (.80, .20) then the expected payoff
is (.75 x .80 + .25 x .20)/.824 = .7888. If he selects the more conservative
bet corresponding to (.70, .30) the expected payoff is
(.75 x .70 + .25 x .30)/.761 = .7884. Although the loss is not great for a
discrepancy of this magnitude, it increases rather sharply for more
inappropriate bets.
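
The incentive property of this betting scheme can be explored numerically. The sketch below is an illustration with hypothetical function names; it evaluates the expected payoff for every stated confidence on a grid and reports the best one. The payoff has the form of what is now usually called the spherical scoring rule, under which truthful reporting maximizes expected payoff.

```python
import math


def payoffs(stated):
    """Payoffs (if correct, if incorrect) for a stated confidence."""
    norm = math.hypot(stated, 1 - stated)  # sqrt(c^2 + (1 - c)^2)
    return stated / norm, (1 - stated) / norm


def expected_payoff(actual, stated):
    """Expected payoff when the true probability of being correct is `actual`."""
    win, lose = payoffs(stated)
    return actual * win + (1 - actual) * lose


def best_bet(actual, steps=100):
    """Grid search over stated confidences 0.00, 0.01, ..., 1.00."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda stated: expected_payoff(actual, stated))


if __name__ == "__main__":
    print(best_bet(0.75))
    for stated in (0.55, 0.70, 0.75, 0.80, 0.95):
        print(stated, round(expected_payoff(0.75, stated), 4))
```

The printed values show the pattern described in the text: the expected payoff is nearly flat close to the true confidence and falls off faster as the stated value moves further away.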


Experimental  Procedure 

The  judgmental  task  on  which  confidence  measures  were  obtained 
entailed  a  discrimination  between  a  standard  angle  of  90°,  and  a  comparison 
angle  of  less  than  90°.  The  angles  were  printed  on  cards  and  displayed 
in  an  illuminated  aperture.  On  each  trial,  the  subjects  were  shown  a 
card  which  might  contain  either  the  90°  standard  or  the  comparison  angle. 
They  were  required  to  report  their  judgment  within  a  fixed  time  interval. 

The response consisted in setting a movable pointer along a linear
scale from +10, corresponding to virtual certainty that the display was the
90° angle, to -10 representing virtual certainty that it was not the 90°
angle. The subject indicated that he had made his final adjustment for a
trial by pressing a test button. Under feedback conditions this test
resulted in an indication to the subject of his earnings on that trial. Under
the no-feedback condition, the display meter, which showed trial-to-trial
earnings, was covered.

For future reference, Table 1 contains a summary of the various
response measures and corresponding payoffs. The "recorded position"
was the pointer position that was indicated to the experimenter and became
the basic raw data of the study. The "dial position" was the setting on the
subject's response panel; unlike the recorded position, it was explicitly
directional in order to emphasize the amplitude of the response away from
the neutral position. The "angle equivalent" column divides the 20 pointer
positions into equal intervals of 4.74° spanning the range from 0° to 90°,
with 45° corresponding to the neutral position.

The two "payoff" columns are symmetrical with respect to the
neutral position. The numbers appearing in these columns are obtained
by taking sines and cosines, respectively, of the angle equivalents,
subtracting .707, the sine or cosine of 45°, and multiplying by a factor of
ten. The "probability equivalents," finally, are equal interval divisions of
the 20 steps from 0-100% probability. These values are used for all
comparisons with accuracy, and for other computational purposes.
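
The construction of the payoff columns described above can be sketched as follows. This is a reconstruction under assumptions, not a copy of Table 1: it supposes that the 20 positions are equally spaced with the end points at 0° and 90° (which gives steps of 90/19, about 4.74°), and that the probability equivalents are spaced in the same way. The dictionary keys are hypothetical labels.

```python
import math

N_POSITIONS = 20  # pointer positions described in the text


def response_table():
    """Angle equivalent, the two payoffs and probability equivalent per position."""
    rows = []
    for k in range(N_POSITIONS):
        angle = 90.0 * k / (N_POSITIONS - 1)  # assumed spacing, end points included
        rad = math.radians(angle)
        rows.append({
            "angle": angle,
            "payoff_sin": 10 * (math.sin(rad) - 0.707),
            "payoff_cos": 10 * (math.cos(rad) - 0.707),
            "probability": k / (N_POSITIONS - 1),  # assumed equal divisions of 0 to 1
        })
    return rows


if __name__ == "__main__":
    for row in response_table():
        print({key: round(value, 2) for key, value in row.items()})
```

Under these assumptions the two payoff columns mirror each other about 45°, and each runs from roughly -7.07 to +2.93.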

Experimental  design 

The  overall  experimental  design  is  given  in  Table  2.  The  main 
variables  reflected  in  this  design  are: 

a) Feedback - one group of 16 subjects was run with feedback on
all trials: that is, after each judgment they were given an indication of
their earnings on that trial. A second group of 16 subjects was run in an
experimental design that was identical except that they were not given
feedback.

b) Blocks of trials - each subject had 5 blocks of 50 trials each,
all completed in a single experimental session. The first block, for all
subjects, was without any hypothetical partner; in the four succeeding
blocks of trials, they were given a supplementary partner report or "pony"
on each trial.

c) Signal difficulty - two levels of signal difficulty were introduced
by setting the size of the comparison angle at 89° for the hard
discrimination, and 87° for the easy discrimination. Earlier results had
indicated that correct detection in the former case occurred about 65% of
the time and in the latter case about 85% of the time.

d) Pony reliability - The subjects were told that the supplementary
reports they would receive were not completely reliable and were
advised of the actual reliability levels -- 64% and 84% respectively -- for
two reliability levels. These levels were selected to correspond to the
signal difficulty values, modified only for rounding errors. There were
exactly 16 correct answers in each set of 25 trials for the 64% pony, and
exactly 21 correct answers in 25 trials for the 84% pony.

The chief between-subject variable (other than feedback) was the
order in which they had various combinations of signal difficulty and pony.
The basic design is a four-by-four latin square in which the four
signal-pony combinations occur in each serial position. In addition, however,
within each of the subgroups of four subjects who had the same sequence,
two subjects had the hard discrimination in the initial trial block, and
two subjects had the easy discrimination. These trial blocks of course
occurred before the pony was introduced.

e) List - the blocks of 50 trials consisted of 25 actual signal
trials and 25 actual no-signal trials. The order of signal-no-signal
occurrence was randomized 16 times to generate 16 distinct lists. Within
the main sequence of trial blocks (i.e., the last four) each subset of four
subjects had all of the 16 lists, and the subjects did not have the same list
twice.
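
Two elements of this design lend themselves to a short sketch: the ordering of the four signal-pony combinations and the fixed number of correct pony answers per set of 25 trials. Everything below is hypothetical. The cyclic construction is only one of many possible four-by-four latin squares and is not claimed to be the arrangement of Table 2, and the shuffling procedure and seed are assumptions.

```python
import random

# The four signal-pony combinations: (comparison angle in degrees, pony reliability in %).
CONDITIONS = [(87, 64), (87, 84), (89, 64), (89, 84)]


def cyclic_latin_square(items):
    """Each item appears once in every row (sequence) and column (serial position)."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def pony_schedule(n_correct, n_trials=25, rng=None):
    """Random order of pony answers with exactly n_correct correct ones."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    flags = [True] * n_correct + [False] * (n_trials - n_correct)
    rng.shuffle(flags)
    return flags


if __name__ == "__main__":
    for sequence in cyclic_latin_square(CONDITIONS):
        print(sequence)
    print(sum(pony_schedule(16)), sum(pony_schedule(21)))
```

Fixing the count of correct answers, rather than drawing each answer independently, is what makes the announced reliability hold exactly within every set of 25 trials.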

Procedure 

Subjects were male and female Tufts undergraduates. They were
paid at an hourly rate for participating in the study and also were told
that a $5.00 reward would be given for best performance.

The instructions (Appendix) explained the judgmental task and the
reward or payoff system. They were given to the subjects to read, and E
answered relevant questions.

At the beginning of the experiment, and before each of the remaining
four blocks of 50 trials, the subject had a 10 trial procedural "warmup."
On these 10 trials, the signal and pony were presented as for regular
trials and the subject had an opportunity to respond as he would on actual
trials. These data were recorded but will not be reported.

After the subject had made each judgment by positioning the response
switch and testing, he also wrote down his response and the resulting
payoff (under feedback conditions). The payoffs were totalled by the subjects
after 25 trials.

Subjects were also given a brief questionnaire asking how much they
used the pony. These data are not analyzed in the present report.

Results


Confidence  and  accuracy  on  pre-pony  trials 


The first set of data describes the performance of subjects on the
50 trials prior to introduction of the pony. On these trials half of the
subjects had the 87° (easy) discrimination and half had the 89° (difficult)
discrimination. The chief results are summarized in Table 3. They are
classified in terms of the discrimination angle. Trials are broken down
into first 25 and second 25. Feedback and non-feedback subjects are
separated; and the confidence measures are separately averaged for those
trials on which the subjects were correct and those trials on which the
subjects were incorrect. The following conclusions appear to be justified:

1. There is no appreciable improvement in detection from the first to the
second set of 25 trials. This holds for both feedback and non-feedback
subjects.

2. Feedback subjects do not perform better than non-feedback subjects
in terms of accuracy.

3. Subjects perform better on the 87° signal than the 89° signal. The
pooled accuracy for the former is .751 and the latter is .624. These rates
are slightly lower than were obtained on pilot studies but still well enough
separated to serve the main purposes of the study.

4. Confidence scores are higher for the feedback subjects than for the
non-feedback subjects for the easy angle but not for the hard angle
discrimination.

5. Confidence is uniformly higher for the 87° angle than for the 89° angle
for the FB subjects. This does not hold, however, for the non-FB subjects.

6. The confidence scores are higher when subjects are correct than when
the subjects are incorrect.

This last result is shown in more detail, and in a slightly different way,
in Table 4. This table presents the relation between accuracy and each of
the dial settings corresponding to a confidence level. The first column
gives the dial setting from 1 to 10 which the subjects adjusted on each trial.
The next four columns give the corresponding accuracy proportions at that
setting, followed by the mean accuracy across all subjects and all angles,
and the final column gives the equivalent confidence level, that is, the
setting transformed into the corresponding proportion.

Although the relation between accuracy and confidence is monotonic
within sampling error (the product-moment correlation of the unweighted
scores is .819), it is clear that the confidence tends to be exaggerated at
the higher level. As evidence, the regression between confidence and
accuracy is given by the formula c = .062 + 1.28a. As noted earlier, FB
subjects tend to be more overconfident at the 87° level, and there is a
slight tendency for the FB subjects to be more overconfident at the 89°
level. These data and the result noted above (that subjects are more
confident when they are correct) do not show clearly whether the effect
is due to within-subject or between-subject differences; that is, they do
not show whether the individual subject is more likely to give a high
confidence response on those trials on which he is correct, or whether it
is simply true that more accurate subjects use higher confidence responses
across the board. Later results give sharper but still not conclusive
evidence on this point.

Some of the statistical analyses reported here were completed at
the Massachusetts Institute of Technology Computations Laboratory.
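
The correlation and regression quoted above can be approximately
reproduced from the last two columns of Table 4. The Python sketch below
assumes that the ten pairs of mean accuracy and equivalent confidence
level, as recovered from that table, are the "unweighted scores" referred
to; the function name is illustrative only.

```python
from math import sqrt

# Mean accuracy (a) and equivalent confidence level (c) for dial settings
# 1 to 10, as read from the last two columns of Table 4.
accuracy = [.44, .64, .61, .63, .58, .59, .69, .78, .73, .77]
confidence = [.53, .58, .63, .68, .74, .80, .84, .89, .95, 1.00]


def fit_line(x, y):
    """Return (r, intercept, slope) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), my - slope * mx, slope


r, intercept, slope = fit_line(accuracy, confidence)
print(f"r = {r:.3f}, c = {intercept:.3f} + {slope:.2f}a")
```

On these inputs the slope comes out near 1.28 and r near .82, in line with
the reported figures. The fitted intercept is small and negative, so the
sign of the constant in the printed formula may deserve checking against
the original.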

Accuracy  and  confidence  with  pony  available 


Table 5 contains the accuracy scores under the various experimental
conditions for the final four sessions, after the pony was introduced.
The scores for the first and second 25 trials showed no differences, as
before, and have been pooled in this table. The accuracies are computed
separately for those trials on which the pony was correct and the trials
on which the pony was incorrect, and the weighted means over both correct
and incorrect trials are also presented.

For these data, as before, there are no differences between the FB
and FB subjects, and the accuracies are clearly greater for the 87°
discrimination than for the 89° discrimination. The interesting result here
is the interaction effect between pony reliability and correctness. For
the more reliable 84% pony, the subjects' composite judgments are more
accurate when the pony is correct, but less accurate when the pony is
wrong, than with the 64% pony. The natural interpretation of this is that
the subjects tended to depend on the reliable pony more than on the less
reliable pony. At least for the easy discrimination, however, the net
result is that the subjects do little better when they have the pony
answer to lean on. As the judgment becomes more difficult, the pony
becomes more helpful; but it should be noted that only in the case of the
easy angle and the 64% pony does the subject do better than he would by
following the pony exactly.
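
The comparison with "following the pony exactly" can be checked directly
against Table 5. The sketch below is illustrative: it assumes that the
weights behind the combined column are the pony's announced reliability
(the text says only "weighted means"), and that a subject who copied the
pony would be correct exactly that often.

```python
# Table 5 entries: (angle, pony reliability, feedback condition) ->
# (accuracy when pony correct, accuracy when pony incorrect)
table5 = {
    (87, .64, "no feedback"): (.833, .775),
    (87, .84, "no feedback"): (.845, .656),
    (89, .64, "no feedback"): (.645, .494),
    (89, .84, "no feedback"): (.781, .312),
    (87, .64, "feedback"): (.800, .720),
    (87, .84, "feedback"): (.843, .628),
    (89, .64, "feedback"): (.641, .550),
    (89, .84, "feedback"): (.711, .463),
}


def combined_accuracy(reliability, acc_pony_right, acc_pony_wrong):
    """Weighted mean over pony-correct and pony-incorrect trials."""
    return reliability * acc_pony_right + (1 - reliability) * acc_pony_wrong


beats_pony = []
for (angle, reliability, feedback), (right, wrong) in table5.items():
    combined = combined_accuracy(reliability, right, wrong)
    if combined > reliability:  # better than copying the pony on every trial
        beats_pony.append((angle, reliability, feedback))
    print(angle, reliability, feedback, round(combined, 3))
```

Under these assumptions the weighted means agree with the combined column
of Table 5 to within rounding, and only the two 87°, 64% conditions end up
in `beats_pony`.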

In Table 6 the mean confidence scores are given for each of the
eight main experimental conditions and, within conditions, for each of the
four event conditions defined by the correctness of the subject and
agreement with the pony. As before, the confidence values have been
converted to probability for purposes of comparison, and as before they
tend to be high relative to the accuracy under corresponding conditions.
The almost uniform result shown in this table is that subjects are more
confident, whether they are correct or incorrect, when they agree with the
pony. The sole discrepancy is for the easy 87° discrimination with the less
reliable 64% pony. It also tends to be true for the 87° discrimination that
subjects are more confident when they are correct than when they are
incorrect. This last result will be examined again in greater detail in a
later section.

Derived  c  and  t  indices 


These results are put in much sharper focus if the composite
confidence scores are used to derive estimates of confidence in the
subjects' judgment (independent of the pony's answer) and trust in the
pony, as discussed in the introductory section. The values for v^ are the
mean probabilities ascribed to the signal being present when it is present
and the pony is correct, or to the signal not being present when it is not
present and the pony is correct. The values of [symbol illegible] are the
corresponding probabilities when the pony's response is incorrect; thus,
for example, a composite confidence of .70 attached to an incorrect
judgment is treated as a "correct" judgment with the composite confidence
of .30. Results given in Table 7 are divided into estimates for the FB and
FB conditions, but in addition subjects within both FB and FB conditions
have been divided into two groups. The top set of means in each case
refers to subjects who tended to distribute their confidence values in the
unimodal way assumed in the computational scheme. The lower means in each
case describe the confidence scores for the subjects who tended to
"over-use" the extreme confidence values. Thus the top scores are perhaps
on somewhat firmer ground. However, the conclusions listed below follow
for both sets of scores:^

1. Confidence is greater for the 87° judgment than for the 89° judgment.
(This result holds for 31 of 32 sign-test comparisons in the original data.)

2.  The  trust  is  greater  for  the  84%  pony  than  for  the  64%  pony. 

3.  Confidence  exceeds  trust  for  the  87°  judgment  (in  27  of  32  sign-test 
comparisons)  and  is  lower  than  trust  for  the  89°  judgment  (in  24  of  32 
sign-test  comparisons). 

These results are all in accordance with common sense expectations
and tend to vindicate the rather tenuous chain of inference upon which
these indices are based.
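
The unfolding rule used in this section (a confidence of .70 on an
incorrect judgment counts as .30 on the correct alternative) and the 1 to
20 recorded positions of Table 1 can be sketched as follows. This is a
minimal illustration, not the report's computational scheme for c and t;
the function names are hypothetical, and the linear percent equivalent is
an approximation to the Table 1 column, which advances in steps of 5.26.

```python
def probability_on_truth(confidence, judgment_correct):
    """Unfold a composite confidence onto the alternative that was true."""
    return confidence if judgment_correct else 1.0 - confidence


def recorded_position(bet, says_yes):
    """Map a 1-10 bet and its direction to the 1-20 recorded position."""
    if not 1 <= bet <= 10:
        raise ValueError("bet must be between 1 and 10")
    return 10 + bet if says_yes else 11 - bet


def percent_equivalent(position):
    """Approximate percent equivalent: 0 at position 1, 100 at position 20."""
    return 100.0 * (position - 1) / 19


print(round(probability_on_truth(.70, False), 2))
print(recorded_position(10, False), recorded_position(1, True))
```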


^This procedure is based on the assumption that the internal signal
associated with the objective presence of the signal has a continuous
unimodal distribution about a positive value but running through the 50%
point. This choice of assumptions was dictated by parsimony and
computational convenience. Subsequent analysis indicates that it results
in an underestimate of both c and t values, but that the relative
magnitudes are maintained without great distortion.


Variance estimates for unfolded confidence scores


Although it did not seem judicious to depend heavily on parametric
statistics for this exploratory study, an analysis of variance was
performed in an effort to determine the relative magnitudes of separate
sources of variation. Table 8 gives certain sums of squares extracted from
a 2^7 factorial analysis of variance of the composite confidence scores
classified in terms of all experimental and event conditions. The scores
considered here are the "unfolded" or full range scores running from 1 to
20, which in fact reflect both confidence and accuracy. These results and
the associated mean differences (which will be converted to p values)
support the following conclusions:

1.  There  is  no  bias  in  favor  of  signal  presence  or  absence;  the  mean 
confidence  over  all  events  and  conditions  is  almost  exactly  50%. 

2. All experimental treatments combined account for about half the
variance in the composite judgment. Both subject and treatment effects are
highly significant as compared with their interaction.

3. There is a slight but significant tendency (indicated by item c in the
table) for a composite judgment in favor of signal presence under the
difficult discrimination.

4. Signal presence accounts for about half of the total variation due to
treatments; of course, with perfect discrimination this would be the sole
source of variation. The mean confidence scores with signal present and
absent are .642 and .358 respectively.

5. The second largest source of variation is the interaction between
signal difficulty and the presence or absence of the signal on a particular
trial. With the difficult 89° test angle and the signal present the mean
confidence is .573; with the easy 87° test angle and the signal present it
is .710.

6. Availability of feedback interacts with both the signal and pony events
as shown by the interactions a x f and a x g. These results are summarized
by the statement that FB subjects are influenced somewhat more by
the signal and somewhat less by the pony.

7. The significant interaction (c x g) between the test angle and the pony
event indicates the greater influence of the pony when the discrimination
is difficult.

8. The interaction (d x g) between the pony reliability and pony event
(that is, "yes" or "no" pony answer on a particular trial) is, as would be
expected, reflected in a greater influence of the 84% than of the 64% pony.

Thus the analysis of variance results, here treated as only
descriptive material, do tend to corroborate the results obtained on the
more conservative sign-tests presented earlier. They also give some
preliminary indication of the relative magnitudes of the effects in
quantitative terms.
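
The proportions cited in the conclusions above can be recomputed from the
sums of squares in Table 8. The sketch below assumes that each F ratio is
formed against the treatments x subjects mean square, which reproduces the
tabled values but is not stated explicitly in the text; only a few of the
tabled sources are included.

```python
# (source, df, sum of squares) as listed in Table 8
effects = [
    ("f) signal presence", 1, 7300.30),
    ("g) pony answer", 1, 2284.71),
    ("c x f", 1, 2541.27),
    ("d x g", 1, 411.31),
]
ALL_TREATMENTS = (127, 14650.00)
SUBJECTS = (7, 256.71)
ERROR = (889, 13154.78)  # treatments x subjects interaction

error_mean_square = ERROR[1] / ERROR[0]


def f_ratio(df, ss):
    """Mean square of a source divided by the interaction mean square."""
    return (ss / df) / error_mean_square


total_ss = ALL_TREATMENTS[1] + SUBJECTS[1] + ERROR[1]
print(f"treatments / total SS = {ALL_TREATMENTS[1] / total_ss:.3f}")
for name, df, ss in effects:
    share = ss / ALL_TREATMENTS[1]
    print(f"{name}: F = {f_ratio(df, ss):.2f}, share of treatment SS = {share:.3f}")
```

On these figures the treatments carry about 52% of the total sum of
squares, and signal presence about half of the treatment sum of squares,
consistent with conclusions 2 and 4.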


Confidence and Accuracy


This section will return to the relation between confidence and
accuracy, and in particular to the within-subject aspect of this; that is,
not just whether the accurate subjects tend to be more confident, but also
whether individual subjects tend to be more confident on trials upon which
they are correct. In order to investigate this, the mean confidence scores
(folded) have been computed for each of the subjects under each of the
four experimental conditions and each of the four event conditions defined
by the subject's correctness and the pony's correctness. The data here
considered, as before, are sign tests, in this case the signs of the
differences between the mean confidence when the subject is in fact
correct and the mean confidence when the subject is incorrect, controlling
for both experimental and event conditions.

As far as it goes, the evidence is quite clear that individual
subjects do make more confident judgments when they are correct. In a
total of 384 sign test comparisons, 231 or 60% are in favor of higher
confidence on correct trials, all other conditions being identical. A more
detailed analysis shows, however, that this is due almost entirely to
those trials on which the pony is also correct. Under those conditions the
confidence is higher for the correct judgment in 73% of all cases, whereas
it is higher in only 44% of the cases in which the pony is incorrect. This
leaves the question of individual reaction to correctness still up in the
air.
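
The report gives only the counts for its sign tests. As a rough guide to
how far such counts depart from an even split, the sketch below computes a
two-sided binomial sign-test probability. It assumes independent
comparisons, which is doubtful here since many comparisons come from the
same subjects, so the values are illustrative rather than a reanalysis.

```python
from math import comb


def sign_test_p(successes, n):
    """Two-sided binomial sign-test p-value under a 50/50 null hypothesis."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Counts quoted in the text: 231 of 384, 31 of 32, 27 of 32, 24 of 32.
for successes, n in [(231, 384), (31, 32), (27, 32), (24, 32)]:
    p = sign_test_p(successes, n)
    print(f"{successes}/{n}: proportion {successes / n:.2f}, p = {p:.2g}")
```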

It might be noted incidentally that there is no evidence in the data
described above for any effect due to feedback or to acquisition over
successive sessions; that is, there is no tendency for subjects to become
more realistic in their estimates due to either of these learning
conditions.

Taking one more step along this line, the final result concerns the
relation between the actual point gains of subjects and the score they
might have achieved with a more realistic appraisal of accuracy of
judgment. The latter value is computed by assuming that the "ideal subject"
would on each trial use the confidence level corresponding to his mean
accuracy over all trials under that particular experimental condition.
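
The discrepancy score described above can be sketched as follows, using
the payoffs printed in Table 1. Two points are assumptions rather than
the report's stated procedure: the "corresponding" confidence level is
taken to be the bet whose percent equivalent lies nearest the mean
accuracy, and the trial record at the end is hypothetical.

```python
# Payoffs for bets 1-10 on the chosen side, as printed in Table 1.
GAIN = [0.28, 0.72, 1.30, 1.80, 2.09, 2.38, 2.62, 2.79, 2.90, 2.93]
LOSS = [-0.30, -0.94, -1.60, -2.31, -3.06, -3.81, -4.62, -5.42, -6.14, -7.07]
# Percent equivalents of bets 1-10 from Table 1, expressed as proportions.
LEVEL = [.5260, .5786, .6312, .6838, .7364, .7890, .8416, .8942, .9468, 1.0]


def points(bets, correct):
    """Total points over a run of trials, given each bet (1-10) and outcome."""
    return sum(GAIN[b - 1] if ok else LOSS[b - 1] for b, ok in zip(bets, correct))


def ideal_bet(correct):
    """Bet whose percent equivalent is nearest the subject's mean accuracy."""
    accuracy = sum(correct) / len(correct)
    return min(range(1, 11), key=lambda b: abs(LEVEL[b - 1] - accuracy))


def discrepancy(bets, correct):
    """Actual points minus the points earned by a uniform 'ideal' bet."""
    uniform = [ideal_bet(correct)] * len(correct)
    return points(bets, correct) - points(uniform, correct)


# Hypothetical four-trial record: maximum bets throughout, one error.
bets = [10, 10, 10, 10]
correct = [True, True, True, False]
print(ideal_bet(correct), round(discrepancy(bets, correct), 2))
```

A negative value means the subject earned less than the uniform bet would
have, which is the direction of most discrepancies reported below.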

The  results  in  terms  of  discrepancies  bear  out  the  following  points: 

1. In general subjects do not score as well as they might by selecting a
uniform confidence for all trials in accordance with their long run
accuracy. Negative discrepancy scores are obtained by 24 of the 32 subjects
and in 83 of the 128 sessions. Moreover, the negative discrepancies are in
general considerably larger than the positive ones.

2. There is a correlation between negative discrepancy and low accuracy;
that is, subjects tend to lose potential points under poor performance.
At the same time it should be mentioned that the mean loss per trial due
to unrealistic confidence level is only .20 points on the 10 point scale;
thus, it appears that subjects do compensate by high confidence bets on
trials on which they are in fact correct.

3. There is no relation in these data with either feedback or with
training; that is, there is no evidence of learning to become more
realistic over trials.

It must then be concluded that these data do not provide a firm
answer to the question of whether the subjects are sensitive to
fluctuations in soundness of judgment from trial to trial. Some additional
evidence might have been obtained from the pre-pony trials, but it seemed
better advised to reserve this question for a later study.


Summary 


From  a  methodological  standpoint  the  results  of  this  report  are 
distinctly  encouraging.  The  experimental  manipulations  for  the  most  part 
operate  as  expected,  and  the  response  measures  appear  to  be  sensitive  to 
experimental  conditions. 

At  the  same  time,  it  is  clear  that  several  further  methodological 
improvements  are  required  before  definitive  results  can  be  obtained. 

1. Although the subjects in general appeared to use the "betting"
scheme in a rational (gain maximizing) way, many subjects still used the
extreme scores with inordinate frequency. In order for the measurement
procedure here employed to be fully successful this "sporting" reaction
must be further minimized. In research subsequent to that reported here,
an attempt has been made to achieve this by identifying the confidence
judgments directly with probability rather than using the 10 point scale.
If this and related instructional devices are unsuccessful, the rather
unpalatable alternative would be to discard or isolate non-conforming
subjects on the basis of score distributions.

2. At no point in the above results does the effect of the
availability of trial-to-trial feedback seem to be as pronounced as one
would expect. There are few major differences between FB and FB subjects,
and little evidence of acquisition on the part of the latter subjects. One
suggested procedural modification would be to provide FB subjects with
a cumulative record of earnings as well as feedback on each trial. One
of the followup studies mentioned below should shed more light on the
desirability of such modification.


Appendix


Instructions 


You  will  be  shown  angles,  some  of  which  are  90°,  i.e.  right  angles, 
and  some  of  which  are  smaller.  Your  task  will  be  to  judge  whether  the 
angle  you  see  on  any  given  trial  is  a  right  angle.  In  addition,  we  want  to 
know  how  sure  you  are  of  your  decision.  Therefore,  we  will  allow  you  to 
bet  up  to  10  points  on  your  judgments. 

The angles will appear behind this glass frame when the light goes
on. You will see the angle for half a second and then have approximately
15 seconds in which to make your bet. Use the pointer on the scaled panel
in front of you to indicate the bet. Notice that the scale goes from a
maximum number of 10 points on the left for a "no" to a maximum number of
10 points on the right for a "yes." Do not use the zero point at the center
since this means that you are unwilling to bet; and we want you to make
a bet, however small, in one direction or the other. After you have placed
your bet, you may find out whether you have won or lost points. To do
this, push the button to the right of the pointer scale and read the results
from the meter facing you. A negative value indicates a loss; and a
positive one, a gain. With respect to payoffs on your bets, keep in mind
that as the bet size increases, the size of losses increases at a more rapid
rate than does the size of the gains. I will illustrate this to you when you
have finished reading the instructions. Every 25 trials you will be given
two minutes to figure out your earnings.

Here  is  exactly  how  a  trial  will  proceed. 

1.  You  see  the  angle  for  half  a  second. 

2. You decide what you want to bet on your decision, set the pointer at
the location corresponding to your bet, and enter the amount of the
bet in the proper column on the sheet in front of you, i.e., the "Yes"
or the "No" column, depending on your decision.

3. Push the button to the right of the pointer, indicating to me that you
are "testing." Read the payoff. Enter the payoff in the proper
column, i.e., the "+" column if you have won points and the "-" column if
you have lost points. Reset the pointer to zero. Please try to keep
your record neat and easily readable.

You will see 5 sets of 50 angles each. In each set there will be
25 right angles and 25 smaller angles. For the first set the procedure
will be exactly as outlined above. For the other 4 sets you will have
additional information which may help you in making decisions. I have a
list of a hypothetical subject's responses to the same sets of angles I will
be showing you, presented in the same order as you will be seeing them.
For 2 of the sets he responds correctly 64% of the time and for the other
two, 84% of the time. Before beginning each set I will tell you what this
percentage is; and before each trial I will tell you what the hypothetical
subject's response is for that trial.

Your object throughout the experiment will be to win as many
points as possible. The person who gets the highest number of points will
get $5.00 in addition to what he earns for working as a subject.

Keep in mind the fact that there is no patterned order of
presentation in any set of angles and that for each set of 50, there will
be 25 right angles and 25 that are smaller. For each set the smaller angle
is the same size; and I will tell you before beginning the set exactly what
the size of the smaller angle will be. If you have any questions, please
ask them before we begin the experiment because after that we will be
working fairly rapidly and without interruption.


Table 1


Correspondences between Confidence Measures
and Payoff Values


Recorded    Dial        Angle         Payoff    Payoff    Percent
position    position    equivalent    Yes       No        equivalent

    1       10 no          0          -7.07      2.93        0
    2        9 "           4.74       -6.14      2.90        5.26
    3        8 "           9.47       -5.42      2.79       10.52
    4        7 "          14.21       -4.62      2.62       15.78
    5        6 "          18.95       -3.81      2.38       21.04
    6        5 "          23.68       -3.06      2.09       26.30
    7        4 "          28.42       -2.31      1.80       31.56
    8        3 "          33.16       -1.60      1.30       36.82
    9        2 "          37.89       -0.94      0.72       42.08
   10        1 "          42.63       -0.30      0.28       47.34
   11        1 yes        47.37        0.28     -0.30       52.60
   12        2 "          52.10        0.72     -0.94       57.86
   13        3 "          56.84        1.30     -1.60       63.12
   14        4 "          61.58        1.80     -2.31       68.38
   15        5 "          66.31        2.09     -3.06       73.64
   16        6 "          71.05        2.38     -3.81       78.90
   17        7 "          75.79        2.62     -4.62       84.16
   18        8 "          80.52        2.79     -5.42       89.42
   19        9 "          85.26        2.90     -6.14       94.68
   20       10 "          90.00        2.93     -7.07      100.00

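
The way the Table 1 payoffs penalize overstated confidence can be seen by
computing the expected points of each bet for a subject whose judgment is
correct with probability p. The sketch below uses the payoffs as printed;
the probabilities in the loop are hypothetical examples, and the function
names are illustrative.

```python
# Payoffs for bets 1-10 on the chosen side, as printed in Table 1.
GAIN = [0.28, 0.72, 1.30, 1.80, 2.09, 2.38, 2.62, 2.79, 2.90, 2.93]
LOSS = [-0.30, -0.94, -1.60, -2.31, -3.06, -3.81, -4.62, -5.42, -6.14, -7.07]


def expected_points(bet, p):
    """Expected payoff of a bet (1-10) when the judgment is right with probability p."""
    return p * GAIN[bet - 1] + (1 - p) * LOSS[bet - 1]


def best_bet(p):
    """Bet with the highest expected payoff for a given probability of being right."""
    return max(range(1, 11), key=lambda bet: expected_points(bet, p))


for p in (0.55, 0.65, 0.75, 0.85, 0.95):
    bet = best_bet(p)
    print(f"p = {p:.2f}: best bet {bet}, expected points {expected_points(bet, p):.3f}")
```

For probabilities well short of certainty the expected payoff peaks at an
intermediate bet, so a subject who bets the maximum on every trial gives
up points on average.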
Table 2


Experimental Design


                         BLOCK OF 50 TRIALS
Group    S          1       2        3        4        5

  1      1, 2       H     H 64%    H 84%    E 64%    E 84%
  2      3, 4       E     H 64%    H 84%    E 64%    E 84%
  3      5, 6       H     H 84%    H 64%    E 84%    E 64%
  4      7, 8       E     H 84%    H 64%    E 84%    E 64%
  5      9, 10      H     E 64%    E 84%    H 64%    H 84%
  6      11, 12     E     E 64%    E 84%    H 64%    H 84%
  7      13, 14     H     E 84%    E 64%    H 84%    H 64%
  8      15, 16     E     E 84%    E 64%    H 84%    H 64%

H: 89°
E: 87°

Table 3


Accuracy and Confidence Scores Under Various Experimental
Conditions Before Introduction of Pony


                            1st 25 Trials             2nd 25 Trials
Angle                   Proportion  Confidence    Proportion  Confidence

87°   FB   Correct         77.5        81.5          73.5        83.0
           Incorrect       22.5        73.6          26.5        75.3
           Pooled           --         79.7           --         80.9

87°   FB   Correct         72.5        85.4          77.0        91.3
           Incorrect       27.5        81.6          23.0        82.7
           Pooled           --         84.4           --         89.3

89°   FB   Correct         60.0        82.1          67.5        84.1
           Incorrect       40.0        80.9          22.5        80.0
           Pooled           --         81.6           --         82.8

89°   FB   Correct         65.5        79.0          56.5        83.5
           Incorrect       34.5        75.2          43.5        81.2
           Pooled           --         77.7           --         82.4

Table 4


Detection Accuracy at Various Levels of Confidence
Under Differing Experimental Conditions


                   87°                 89°          Subjects'    Equivalent
Dial                                                Mean         Confidence
Setting        FB       FB         FB       FB      Accuracy     Level

   1          .50  [illegible]    .40      .40        .44          .53
   2          .65      .55        .62      .69        .64          .58
   3          .75      .60        .67      .53        .61          .63
   4          .69      .78        .57      .64        .63          .68
   5          .60      .67        .58      .61        .58          .74
   6          .67      .55        .58      .60        .59          .80
   7          .78      .69        .63      .65        .69          .84
   8          .88      .75        .61      .67        .78          .89
   9          .77      .74        .93      .52        .73          .95
  10          .86      .84        .68      .69        .77         1.00

Mean
Accuracy      .755     .747       .637     .610

Mean
confidence    .800     .868       .822     .786

Table 5


Mean Accuracy of Composite Judgments Under
Various Experimental Conditions


                       No Feedback                       Feedback

                Pony      Pony                    Pony      Pony
                Correct   Incorrect   Combined    Correct   Incorrect   Combined

87°    64%       .833      .775        .812        .800      .720        .771
87°    84%       .845      .656        .815        .843      .628        .808
89°    64%       .645      .494        .591        .641      .550        .608
89°    84%       .781      .312        .706        .711      .463        .671

Table 6


Mean Confidence (Corrected to Probability) Under
Various Experimental and Event Conditions


                                          No Feedback             Feedback

Angle        Pony                      Agrees     Disagrees    Agrees     Disagrees
Difficulty   Reliability               With Pony  With Pony    With Pony  With Pony

87°          64%          correct        .866       .824         .901       .894
                          incorrect      .777       .763         .827       .852

87°          84%          correct        .880       .829         .904       .880
                          incorrect      .805       .769         .871       .816

89°          64%          correct        .814       .778         .846       .830
                          incorrect      .810       .790         .858       .780

89°          84%          correct        .863       .776         .852       .840
                          incorrect      .891       .789         .834       .797

Table 7


Mean Estimated Confidence (c) and Trust (t)
Under Various Experimental Conditions


                                Feedback            No Feedback
Signal        Pony              Conditions          Conditions
Difficulty    Reliability       c        t          c        t

89°           64%              .549     .553       .542     .537
                               .594     .546       .545     .575

89°           84%              .554     .623       .506     .643
                               .577     .573       .429     .785

87°           64%              .692     .569       .647     .533
                               .792     .510       .820     .565

87°           84%              .699     .591       .643     .574
                               .787     .568       .783     .634

Table 8


Variance Associated With Selected Combinations
of Experimental and Event Conditions


Source                     df      Sums of Squares        F

a) feedback                 1              .02           .00
b) pretraining angle        1             1.95           .13
c) test angle               1            77.21          5.22
d) pony reliability         1             1.85           .12
e) trials                   1             4.36           .29
f) signal presence          1          7300.30        493.35
g) pony answer              1          2284.71        154.40
a x f                       1            83.76          5.66
a x g                       1           107.30          7.25
b x g                       1           125.57          8.49
c x f                       1          2541.27        171.74
c x g                       1           183.80         12.42
d x g                       1           411.31         27.80

All treatments            127         14650.00          7.80
Subjects                    7           256.71          2.48
Treatments x Ss           889         13154.78